import pandas as pd
import altair as alt
import datetime
url = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/building_inventory.csv"
data = pd.read_csv(url)
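# Altair raises MaxRowsError for datasets over 5,000 rows by default; lift that limit
alt.data_transformers.disable_max_rows()
# Fill missing construction years and square footage with each column's median
data['Year Constructed'] = data['Year Constructed'].fillna(data['Year Constructed'].median())
data['Square Footage'] = data['Square Footage'].fillna(data['Square Footage'].median())
# Building age in years, relative to the year the notebook is run
current_year = datetime.datetime.now().year
data['building_age'] = current_year - data['Year Constructed']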
scatter_plot = alt.Chart(data).mark_circle().encode(
    x='building_age',
    y='Square Footage',
    color='Agency Name',
    tooltip=['building_age', 'Square Footage', 'Agency Name']
).properties(
    title='Building Age vs. Square Footage'
).interactive()
scatter_plot
This scatter plot visualizes the relationship between building age and square footage. Each point represents one building, positioned by its age (derived from the year it was constructed) and its square footage. The goal is to see whether a building's size is related to its age.
X-axis (building_age): The age of the building in years, computed as the current year minus the construction year. It is a quantitative value.
Y-axis (Square Footage): The size of the building in square feet, also a quantitative value.
Color (Agency Name): Points are colored by the agency responsible for the building, so buildings belonging to different agencies can be told apart.
Tooltip: Hovering over a point shows its building age, square footage, and agency name.
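A scatter plot shows the shape of the relationship, but a single number can summarize it. The sketch below is a hypothetical addition, not part of the original analysis: it computes a Spearman rank correlation (the Pearson correlation of the ranks) on a small made-up DataFrame that only mimics the `building_age` and `Square Footage` columns. A rank-based measure is assumed here because a few very large buildings could otherwise dominate a plain Pearson coefficient.

```python
import pandas as pd

# Hypothetical toy data standing in for the building inventory columns
toy = pd.DataFrame({
    "building_age": [10, 25, 40, 55, 70],
    "Square Footage": [1200, 3400, 2500, 8000, 9100],
})


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return x.rank().corr(y.rank())


rho = spearman(toy["building_age"], toy["Square Footage"])
print(round(rho, 2))  # prints 0.9
```

Ranking by hand avoids the SciPy dependency that `Series.corr(method="spearman")` pulls in. A value near +1 would mean older buildings tend to be larger; a value near 0 would mean no monotonic relationship.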
Replaced missing values (NaN) in Year Constructed with the column median so that every building has a valid age. Replaced NaN values in Square Footage with the median as well, so every building can be plotted. Then created a new column, building_age, as the difference between the current year and Year Constructed.
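The preprocessing steps above can be sketched on a tiny hypothetical DataFrame. To keep the output reproducible, this sketch assumes a fixed `REFERENCE_YEAR` (a name introduced here for illustration) instead of `datetime.datetime.now().year`; the column names match the notebook, but the values are made up.

```python
import pandas as pd

# Hypothetical fixed year; the notebook uses datetime.datetime.now().year instead
REFERENCE_YEAR = 2024

toy = pd.DataFrame({
    "Year Constructed": [1950.0, None, 1990.0, 2000.0],
    "Square Footage": [1000.0, 2000.0, None, 4000.0],
})

# Median imputation: Series.median() skips NaN by default
for col in ["Year Constructed", "Square Footage"]:
    toy[col] = toy[col].fillna(toy[col].median())

# Age is computed only after filling, so no row ends up with a NaN age
toy["building_age"] = REFERENCE_YEAR - toy["Year Constructed"]
print(toy["building_age"].tolist())  # [74.0, 34.0, 34.0, 24.0]
```

One side effect worth keeping in mind: every imputed building lands on exactly the median age, which can show up as a vertical stripe of points in the scatter plot.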
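# Label missing usage descriptions so groupby() does not drop them
data['Usage Description'] = data['Usage Description'].fillna('Unknown')
# Count buildings per usage description
usage_count = data.groupby('Usage Description').size().reset_index(name='count')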
bar_chart = alt.Chart(usage_count).mark_bar().encode(
    x='count',
    y=alt.Y('Usage Description', sort='-x'),
    color='Usage Description',
    tooltip=['Usage Description', 'count']
).properties(
    title='Distribution of Building Types in the Dataset'
).interactive()
bar_chart
This horizontal bar chart displays the distribution of building types in the dataset. Each bar represents one facility type, and its length indicates how many buildings fall into that category. The chart shows which building types are most and least common in the dataset.
X-axis (count): The number of buildings for each facility type.
Y-axis (Usage Description): The facility types, i.e., the usage description of each building.
Sorting: The bars are sorted by count in descending order, which makes the most and least common facility types easy to identify.
Color (Usage Description): The bars are color-coded by usage description, which helps distinguish the categories visually.
Tooltip: Hovering over a bar shows the usage description and the number of buildings.
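Bar lengths show absolute counts; the relative abundance the chart is meant to convey can also be expressed as shares. The sketch below uses a made-up `Usage Description` series (the labels are hypothetical) and pandas `value_counts`, whose default descending order matches the `sort='-x'` ordering of the bars.

```python
import pandas as pd

# Hypothetical usage labels standing in for the real column
usage = pd.Series(
    ["Office", "Storage", "Office", "Residential", "Office", "Storage"],
    name="Usage Description",
)

counts = usage.value_counts()                # sorted by count, descending
shares = usage.value_counts(normalize=True)  # fraction of all buildings

print(counts.to_dict())            # {'Office': 3, 'Storage': 2, 'Residential': 1}
print(round(shares["Office"], 2))  # 0.5
```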
Missing values: Replaced missing values in Usage Description with the placeholder 'Unknown' so that every building has a category; without this step, groupby() would silently drop those rows.
Aggregation: Grouped the data by Usage Description and counted the buildings in each facility type.
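The reason the `'Unknown'` placeholder matters can be shown on a toy DataFrame (values are hypothetical). By default `groupby` uses `dropna=True`, so rows with a missing key vanish from the counts; filling them first keeps every building. Passing `dropna=False` to `groupby` (pandas 1.1 or later) would be an alternative that keeps a NaN group instead of a labeled one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Usage Description": ["Office", np.nan, "Office", np.nan, "Storage"]})

# Without filling, NaN keys are dropped by groupby (dropna=True by default)
raw = df.groupby("Usage Description").size()

# Fill first, then count, mirroring the notebook's approach
filled = df.copy()
filled["Usage Description"] = filled["Usage Description"].fillna("Unknown")
usage_count = filled.groupby("Usage Description").size().reset_index(name="count")

print(raw.sum(), usage_count["count"].sum())  # 3 5
```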